Xin Tao

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Jun 01, 2026
Via arXiv

SRC-Flow: Compact Semantic Representations Enable Normalizing Flows for Image Generation

May 18, 2026
Via arXiv

Amodal SAM: A Unified Amodal Segmentation Framework with Generalization

Apr 22, 2026
Via arXiv

Stable Velocity: A Variance Perspective on Flow Matching

Feb 05, 2026
Via arXiv

VMonarch: Efficient Video Diffusion Transformers with Structured Attention

Jan 29, 2026
Via arXiv

SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

Jan 23, 2026
Via arXiv

CamPilot: Improving Camera Control in Video Diffusion Model with Efficient Camera Reward Feedback

Jan 22, 2026
Via arXiv

A Mechanistic View on Video Generation as World Models: State and Dynamics

Jan 22, 2026
Via arXiv

Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

Dec 18, 2025
Via arXiv

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Dec 18, 2025
Via arXiv